
feat: stateless watcher implementation #1

Open
imajus wants to merge 1 commit into main from feat/initial-implementation

imajus commented May 13, 2026

First implementation pass of the cognee-watcher per DESIGN.md. Parked pending the Cognee deployment so we don't lose the work.

Scope

  • src/cognee_watcher/ modules:
    • config: pydantic-settings env validation
    • log: JSON structured logs
    • encoding: path↔name bijection (_ → _U, then / → __)
    • client: async REST wrapper with tenacity retries on 5xx / 429 / transport errors
    • watcher: awatch loop, debounce, op routing, single cognify per burst
    • reconcile: boot + periodic drift-repair sweep
    • main: CLI entry, SIGTERM handling, wiring
  • tests/ — 27 tests via pytest + respx (HTTP mocking) + in-memory fake client. Encoding round-trips, REST client error paths, watcher routing (add / update / delete / batch / ignore-globs / collapse).
  • pyproject.toml (hatchling, deps: httpx, watchfiles, tenacity, pydantic-settings, python-json-logger), Dockerfile, .env.example, README.md quickstart.
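
For reviewers who want to sanity-check the encoding rule, here is a minimal sketch of the path-to-name mapping as described (replace _ with _U, then / with __). The function names `encode` and `decode` are illustrative and may not match what is in src/cognee_watcher/encoding; the decoder is written as a left-to-right scan on the assumption that every underscore in an encoded name starts a two-character escape.

```python
def encode(path: str) -> str:
    """Map a relative path to a flat name: '_' -> '_U' first, then '/' -> '__'."""
    return path.replace("_", "_U").replace("/", "__")


def decode(name: str) -> str:
    """Invert encode() by scanning escapes left to right."""
    out: list[str] = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch != "_":
            out.append(ch)
            i += 1
            continue
        nxt = name[i + 1 : i + 2]  # empty string if '_' is the last character
        if nxt == "U":
            out.append("_")  # '_U' was an underscore in the original path
        elif nxt == "_":
            out.append("/")  # '__' was a path separator
        else:
            raise ValueError(f"not a valid encoded name: {name!r}")
        i += 2
    return "".join(out)
```

Escaping underscores before separators is what keeps the mapping reversible: after the first step every original underscore is followed by `U`, so a doubled underscore can only have come from a `/`.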

Verified locally

  • pytest -q → 27 passed
  • ruff check . → clean
  • cognee-watcher boots and hard-fails with a clear error on missing required env

Not yet verified (blocks merge)

  • End-to-end against a real Cognee instance — deployment is WIP. Open assumptions to confirm once it's up:
    • GET /api/v1/datasets/{ds}/data?name=… filters server-side (client falls back to scanning anyway)
    • PATCH /api/v1/update accepts dataId + datasetId as multipart form fields alongside the file
    • Response payload shape for /add / /update / /data (current code tolerates both raw list and {data: […]} envelope)
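
Until the real payload shape is confirmed, the tolerant handling mentioned above could look roughly like the sketch below. It accepts either a raw list or a {data: […]} envelope and always filters by name on the client side, so it works whether or not the server honours ?name=. The helper names and the `name` key on each item are assumptions, not confirmed Cognee fields.

```python
from typing import Any, Optional


def extract_items(payload: Any) -> list[dict[str, Any]]:
    """Accept either a raw list or a {"data": [...]} envelope."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):
        return payload["data"]
    raise ValueError(f"unexpected payload shape: {type(payload).__name__}")


def find_by_name(payload: Any, name: str) -> Optional[dict[str, Any]]:
    """Scan the listing for an exact name match (assumes items carry a 'name' key)."""
    for item in extract_items(payload):
        if item.get("name") == name:
            return item
    return None
```

Raising on an unrecognised shape, rather than returning an empty list, keeps a changed API from being mistaken for "no such document", which would otherwise trigger a spurious add.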

Done when

  • A real Cognee instance is reachable from a test container, an add → modify → delete cycle is observed end-to-end, and the corresponding entities/edges appear / supersede / vanish in the dataset.

- Async REST client wrapping /add, /update, /delete, /cognify, and dataset
  listing with name lookup; respects 5xx/429 with tenacity exponential backoff.
- Watcher loop using watchfiles awatch + per-burst debounce, collapses
  add->delete sequences and emits a single /cognify per dataset per batch.
- Reconciliation sweep on boot and on a configurable interval to repair drift
  from missed events.
- Path-to-name encoding bijection (replace _ with _U, then / with __).
- Env-var configuration via pydantic-settings; hard-fails on missing required.
- Structured JSON logs via python-json-logger.
- Dockerfile and CLI entrypoint (cognee-watcher / python -m cognee_watcher).
- Tests cover encoding round-trips, REST client via respx, and operation
  routing via an in-memory fake client. 27 tests, ruff clean.
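
As an illustration of the per-burst collapse, the sketch below reduces an ordered list of events to at most one operation per path. It is a simplified model, not the code in this commit: event kinds are plain strings ("added", "modified", "deleted") standing in for whatever the watcher receives from watchfiles, and the rule that delete followed by re-add becomes an update is an assumption.

```python
from typing import Iterable


def collapse(events: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Reduce an ordered burst of (kind, path) events to one op per path."""
    first: dict[str, str] = {}
    last: dict[str, str] = {}
    for kind, path in events:
        first.setdefault(path, kind)  # remember the first event seen per path
        last[path] = kind             # keep overwriting to end with the last one

    ops: dict[str, str] = {}
    for path, first_kind in first.items():
        existed_before = first_kind != "added"  # file predates the burst
        exists_after = last[path] != "deleted"  # file survives the burst
        if existed_before and exists_after:
            ops[path] = "update"
        elif exists_after:
            ops[path] = "add"
        elif existed_before:
            ops[path] = "delete"
        # else: created and removed within the burst, nothing to send
    return ops
```

With the operations reduced this way, the caller can issue the REST calls for the batch and then trigger /cognify once for the dataset if `ops` is non-empty.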

Not yet tested end-to-end against a real Cognee instance.